skip to main content


Search for: All records

Creators/Authors contains: "Diakopoulos, Nicholas"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. The scale of scientific publishing continues to grow, creating overload on science journalists who are inundated with choices for what would be most interesting, important, and newsworthy to cover in their reporting. Our work addresses this problem by considering the viability of creating a predictive model of newsworthiness of scientific articles that is trained using crowdsourced evaluations of newsworthiness. We proceed by first evaluating the potential of crowd-sourced evaluations of newsworthiness by assessing their alignment with expert ratings of newsworthiness, analyzing both quantitative correlations and qualitative rating rationale to understand limitations. We then demonstrate and evaluate a predictive model trained on these crowd ratings together with arXiv article metadata, text, and other computed features. Based on the crowdsourcing protocol we developed, we find that while crowdsourced ratings of newsworthiness often align moderately with expert ratings, there are also notable differences and divergences which limit the approach. Yet despite these limitations we also find that the predictive model we built provides a reasonably precise set of rankings when validated against expert evaluations (P@10 = 0.8, P@15 = 0.67), suggesting that a viable signal can be learned from crowdsourced evaluations of newsworthiness. Based on these findings we discuss opportunities for future work to leverage crowdsourcing and predictive approaches to support journalistic work in discovering and filtering newsworthy information. 
    more » « less
  2. Government use of algorithmic decision-making (ADM) systems is widespread and diverse, and holding these increasingly high-impact, often opaque government algorithms accountable presents a number of challenges. Some European governments have launched registries of ADM systems used in public services, and some transparency initiatives exist for algorithms in specific areas of the United States government; however, the U.S. lacks an overarching registry that catalogs algorithms in use for public-service delivery throughout the government. This paper conducts an inductive thematic analysis of over 700 government ADM systems cataloged by the Algorithm Tips database in an effort to describe the various ways government algorithms might be understood and inform downstream uses of such an algorithmic catalog. We describe the challenge of government algorithm accountability, the Algorithm Tips database and method for conducting a thematic analysis, and the themes of topics and issues, levels of sophistication, interfaces, and utilities of U.S. government algorithms that emerge. Through these themes, we contribute several different descriptions of government algorithm use across the U.S. and at federal, state, and local levels which can inform stakeholders such as journalists, members of civil society, or government policymakers 
    more » « less
  3. null (Ed.)
    Headlines play an important role in both news audiences' attention decisions online and in news organizations’ efforts to attract that attention. A large body of research focuses on developing generally applicable heuristics for more effective headline writing. In this work, we measure the importance of a number of theoretically motivated textual features to headline performance. Using a corpus of hundreds of thousands of headline A/B tests run by hundreds of news publishers, we develop and evaluate a machine-learned model to predict headline testing outcomes. We find that the model exhibits modest performance above baseline and further estimate an empirical upper bound for such content-based prediction in this domain, indicating an important role for non-content-based factors in test outcomes. Together, these results suggest that any particular headline writing approach has only a marginal impact, and that understanding reader behavior and headline context are key to predicting news attention decisions. 
    more » « less
  4. null (Ed.)
    This article explores how Twitter’s algorithmic timeline influences exposure to different types of external media. We use an agent-based testing method to compare chronological timelines and algorithmic timelines for a group of Twitter agents that emulated real-world archetypal users. We first find that algorithmic timelines exposed agents to external links at roughly half the rate of chronological timelines. Despite the reduced exposure, the proportional makeup of external links remained fairly stable in terms of source categories (major news brands, local news, new media, etc.). Notably, however, algorithmic timelines slightly increased the proportion of “junk news” websites in the external link exposures. While our descriptive evidence does not fully exonerate Twitter’s algorithm, it does characterize the algorithm as playing a fairly minor, supporting role in shifting media exposure for end users, especially considering upstream factors that create the algorithm’s input—factors such as human behavior, platform incentives, and content moderation. We conclude by contextualizing the algorithm within a complex system consisting of many factors that deserve future research attention. 
    more » « less
  5. Many journalists and newsrooms now incorporate audience contributions in their sourcing practices by leveraging user-generated content (UGC). However, their sourcing needs and practices as they seek information from UGCs are still not deeply understood by researchers or well-supported in tools. This paper first reports the results of a qualitative interview study with nine professional journalists about their UGC sourcing practices, detailing what journalists typically look for in UGCs and elaborating on two UGC sourcing approaches: deep reporting and wide reporting. These findings then inform a human-centered design approach to prototype a UGC sourcing tool for journalists, which enables journalists to interactively filter and rank UGCs based on users’ example content. We evaluate the prototype with nine professional journalists who source UGCs in their daily routines to understand how UGC sourcing practices are enabled and transformed, while also uncovering opportunities for future research and design to support journalistic sourcing practices and sensemaking processes. 
    more » « less
  6. null (Ed.)